Math 560 Homework 08: Langrange Multipliers

---------------------------------------------------------
1 Consider the electricity-generation problem that we used,
min 3x^2 + 2y^2
s.t. x+y>=4, all variables non-negative.
In this problem, you'll be solving various versions of the NLP. I recommend
using a separate sheet for each, so you can keep a record of your work.
a) Find the solution using Excel Solver, and ask for a Sensitivity Report.
Then draw the situation at optimality, involving the obj.func. gradient vector, the
constraint gradient vector, and the constraint gradient vector scaled by the Lagrange Mult.
b) Change the demand constraint to >=4.1 and re-solve.
c) Compare the results to what was predicted by the Lagrange multiplier(s?) in part (a)
d) Going back to >=4, also enforce x=0 and re-solve, again getting a Sensitivity Report.
Draw a diagram, as in part (a).
e) Considering that part (d)'s x=0 constraint was originally an x>=0 constraint,
what is the Lagrange multiplier telling you about being on the x>=0 constraint boundary?
Explain.
f) Formulate part (a) for Wolfram Alpha by asking it for the stationary points of the
Lagrangian function. We'd like to call the Lagrange multiplier "lambda", but it doesn't
usually understand that, so call the Lagrange multiplier "z". Something like this:
stationary points of (some expression involving x, y, z)
Check that you get a reasonable result compared to your answer to part (a)
g) Repeat (f) for the problem in part (d) with the x=0 constraint enforced.
You will need 2 Lagrange multipliers; I suggest you call them z and w.

---------------------------------------------------------
2. Read the article "Economic Congestion Relief Across Multiple
Regions Requires Tradable Physical Flow-Gate Rights" at
http://people.emich.edu/aross15/e/electricity-dualization.pdf
and the questions/responses from a previous year's student, included
below in this homework assignment.
Or, if you prefer some other paper involving Lagrange multipliers, let
me know and you can maybe use that other paper.

Using Excel or Python or Matlab/Octave (I recommend Excel), duplicate
the NLP shown at the bottom-left of page 3 of the document, using
the values shown in Figure 1; you may check your work against the results
shown in Figure 4. You will need to get the Cost functions from Figure 1.
Do you really need to have the q_i variables as official Decision Variables that
you tell Solver about, along with their node-flow constraints?

On Figure 4, draw:
the constraint gradient vectors at optimality,
the constraint gradient vectors multiplied by their Lagrange multipliers,
their vector sum, and
the objective function gradient vector.
If you need to scale them all by some constant to fit on the graph, go ahead.


Also, write down any questions or thoughts you have about the article.
See below for a past student's questions and my answers.

Optional: also solve the partially dualized problem in Excel or other software.
---------------------------------------------------------
Optional:
Consider a chain with N links hanging from (a,b) to (c,d).  Link #i
has a weight of w_i and a length of L_i.
Formulate an NLP that will find the shape that the chain settles into,
by minimizing the total gravitational potential energy.
Part of the fun of this problem is seeing what solutions the solver
settles on that are physically unlikely--if you see any of those, please
include a graph and comment on it.

a) Solve for N=20, with all links having the same weight and same length
(1 gram and sqrt(2) cm).  The left anchor point is at (-10,+10) and the
right anchor point is at (+10,+10).
Show your formulation and a graph of the resulting chain.

b) Re-solve for N=40 links, each half as long as in part (a).  Do this chain
correspond with the chain in part (a) at every second link junction?

c) Let's put a floor at y=5, that the chain may not go below.  Re-solve
part (a).  Comment if anything interesting happens in the final chain.

d) (optional) does either chain (a) or (b) correspond with the continuum
limit of the catenary curve, which is related to hyperbolic-cosine?

e) (optional) Figure out some other cool thing to try, like
changing the weights or lengths of individual links, or other
sorts of floor constraints, or whatever else you dream up. 


---------------------------------------------------------
Questions/Responses on the Congestion electricity article mentioned above:
1) What’s a bilateral transaction?

It's a contract between a producer of electricity and a particular
(large) consumer, like an aluminum refinery or steel recycler. It says
something like "I will supply you with 100 MW from 8am to 6pm on this
particular day, at this particular price." This is as opposed to
generators just selling their power to the general demand pool, or
retail customers like you and me just buying it from our power lines.
The difficulty of bilateral contracts is that the power has to go from a
particular place to another particular place--there isn't much
flexibility, and you hope there's enough capacity on the major
transmission lines.

2) What’s the PTDF (Power Transfer Distribution Factor), and what is it
used for?

It's a
number between 0 and 1, and it quantifies how electric power doesn't
take just a single path through the network. For example, if you're
trying to send 100 MW of power from the nuclear plant Fermi II (in
Monroe County, MI) to Ann Arbor, maybe 80 MW of that power will go along
a transmission line from Romulus to Belleville, and 1 MW will go along a
transmission line from Toledo to Saline, etc. So, the PTDF on the
Romulus-Belleville line is 80% for that transaction, and on the
Toledo-Saline line is 1%. You need to know the PTDFs to figure out the
effect of any proposed change, to avoid overloading any of the lines.

3) What’s counterflow?
Suppose you want to send 100 MW on a line from A to B that has a
capacity of only 95 MW. If you can find someone else who wants to send 5
MW from B to A, then you've found a way to do what you want without
melting the line. That 5 MW from B to A is counterflow.

4) What is meant by the sentence, “Revenues from the sales of
counter-flow rights will subsidize out of merit generation that helps
relieving congestion”?

"Merit order" means "beginning with the cheapest bid and progressing to
more expensive bids" to supply power. It's the ideal, but sometimes we
can't do that. For example, suppose you have a very cheap generator but
the nearby lines have a low capacity; you'll have to run other, more
expensive generators instead ("out of merit generation"="out of
merit-order") to relieve the congestion. That's going to cost extra
money--where will that money come from?  "sales of counter-flow rights".

5) What’s economic dispatch?

Figuring out which generators to run at which output levels to satisfy
demand at the lowest possible cost, without overloading any transmission
lines. Economic=at lowest cost, dispatch=telling people what to do (e.g.
dispatching police cars)

6) Wouldn’t an AC model be better than a DC model, since the
transmission is of AC? (Or is it?)

Almost all transmission is AC, because it's much easier to transform up
to high voltages (hundreds of kilovolts) for long-distance transmission
and then back down to lower voltages (e.g. 4800 volts) for neighborhood
distribution, and ultimately 220 volts for house power. This is done
because the higher the voltage, the lower the current (at the same power
level), and the lower the current, the lower the power lost to
resistance in the lines. This was a big debate between Thomas Edison and
George Westinghouse back in the early days of commercial power; high
voltage (AC) is cheaper than low-voltage (DC) because you lose less
power in transmission, but high voltage is more dangerous (obviously).
Actually, recent advances in electronics have made it possible to
convert DC to AC without huge losses, so there are now high-voltage DC
transmission lines in some places.

Anyway, I agree that AC models are better than DC models. What we're
saying is: if you build your policy on DC models (like CHHP does), you
may be in for a surprise when you implement your policy in the real
world, which uses AC models. When we said "Such possible mishaps can
occur when a control area operator uses an AC model (rather than a DC
approximation)", we didn't mean that AC models are bad. Perhaps we
should have said: "Such possible mishaps can occur when a control area
operator uses a realistic AC model (rather than a DC approximation, as
CHHP proposes)."

7) What other types of problems might partial dualization be valuable in?

As we said in class later, partial dualization = Lagrangian Relaxation,
and it's used in many fields where you have some constraints that are
relatively easy to deal with, and some that are a pain in the neck. You
dualize (=move into the objective function) just the pain-in-the-neck
constraints, and try to find multipliers that are roughly what their
Lagrange multipliers would have been if you had left them as constraints.

8) What’s the difference between real power and reactive power?

If reactive power isn't REAL power, it must be: IMAGINARY (sqrt(-1))
power! I'm actually not kidding.

Resistive loads (toasters, electric heaters or blankets, old incandescent light bulbs)
consume real power, according to the equation V=I*R (voltage=current *
resistance). At any point in the AC power cycle (which happens at 60
cycles per second), the current in a resistive load is directly
proportional to the voltage, with the constant of proportionality being
the resistance. That is, the voltage and current are exactly in phase.

However, things like capacitors and inductors don't behave that way. An
inductor puts up a fight against any change in current. Imagine water
moving in a long pipe: if you all of a sudden increase the water
pressure, it will take a little while for the current (=amount of water
flowing) to change in response, due to the inertia of the existing
water. So, the peak of the current comes a little after the peak of the
pressure (=voltage); current and voltage are not exactly in phase.
Capacitors do the opposite: there, current peaks before voltage. In
either case, the product of Volts*Amps, which is ordinarily the Power (1
volt*1amp = 1 Watt) is not quite the power that's actually being
consumed; the difference is the reactive power. Electric motors often
act somewhat like inductors, so your furnace fan, refrigerator, and air
conditioning all require some reactive power.

Reactive power doesn't do any useful work, but it still has to be
"supplied" to keep the system in balance. Interestingly, the devices
that convert DC from solar panels to AC (which are called "inverters")
can create just about any mix of real and reactive power, which is a
fairly new advance. Also, while compact fluorescent lights use much less
real power than incandescents, they require some reactive power while
incandescents require basically none.

So anyway, because AC involves sine waves being in and out of phase with
each other, it's often easiest to express things in complex-polar
notation, exp(a+j*b) where j=sqrt(-1) because electrical engineers
reserve "i" for current (don't ask me why).

You can see why we chose to ignore reactive power in our paper--it
wouldn't affect the main conclusion.

9) Is "convexifying" a real word? ? If so, is "concavifying" a word?

"Convexifying" is a perfectly cromulent word. Concavifying is only
acceptable in countries south of the equator, where what they're really
doing is convexifying, but it looks upside-down so in 1961 treaty the
two hemispheres agreed that concavifying is an acceptable word (notice
that 1961 looks the same to both hemispheres.)

10) Why does adding a quadratic penalty term convexify the dualized
objective function?

This is the "augmented Lagrangian" method that I mentioned in class the
other day. The ordinary Lagrangian is f(x) + lambda^T * h(x)
and as we've said, it quite often has a saddle point instead of a local
minimum at the site of the
original minimum (to the non-dualized problem). We augment by adding a
term that penalizes us quadratically for any straying from the
constraints. In the simple version, where h(x) is the set of _equality_
constraints (ie h(x) = 0), a measure of how far we are from the
constraints is just h(x) itself. So, our augmented Lagrangian is
f(x) + lambda^T * h(x) + nu * (h(x))^T * h(x)
where the "nu" is a variable that tells us how much to penalize
ourselves for violating the constraints (we will want nu>0, of course).
Since the quadratic part is convex, if we add a big enough version of it
(ie make nu big enough) to the rest of the Lagrangian, we will get a
locally convex function.

11) What does block-separable mean?

The variables can be broken down into subsets, where each subset 
appears in its own little subset of constraints, and not
any of the others.

12) Which of the methods we discussed in class work for problems that
are not convex, but are "locally convex near the feasible region"?

Pretty much all of them, as long as you start close enough to the
optimal solution.


15) What program was used to generate the graphs in the paper? They look
cool.

The graphs were generated with "gnuplot". The non-graph figures were
generated with "xfig", which was originally a unix-only program but now
there are windows versions (winfig and jfig).  If I had it to do again,
I would probably do the graphs in Matlab instead of Gnuplot.

16) What’s an energy adjustment bid?

It's about paying someone
(or quoting a price you'd like to be paid) to adjust the amount of
energy or power you're using, to avoid overwhelming a transmission line.

17) In reference to the statement, "Our example … was not easy to find,"
how long did it take to find it?

I spent a few weeks (or months?) trying to come up with an example by
hand, or even trying to write an optimization problem that would produce
"the optimal counter-example". When that failed, I said "heck, let's
just try random parameters" and that found a counter-example on the 47th
trial. That one was a trivial counter-example (I think it was based on
the values repeating every 2*pi units, which can be dealt with easily),
but within a thousand more trials I think I found the one you see in
this paper. The way the trials went was this:
1. generate cost and transmission parameters randomly
2. solve the original Optimal Power Flow problem, get Lagrange multipliers
3. dualize using those Lagrange multipliers, re-solve, see if new answer
is infeasible for non-dualized problem.
4. repeat at step 1.

I can send you the AMPL code that did this if you like.

18) What does it mean for something to be "distributed via the NERC
process"? (Reference 1)

NERC is like Congress or IEEE standards-setting bodies; they have to
make a proposal of a new standard, put it out there for discussion, see
what people think, etc.   That stuff isn't done in established academic
journals. Originally it was probably done by photocopying and mailing;
these days it's almost certainly web-based.


(PS: if you don't know what "cromulent" means, google it.)